Abstract

Glyph-based visualization is an effective tool for depicting multivariate information. Since sorting is one of the 
most common analytical tasks performed on individual attributes of a multi-dimensional data set, this motivates 
the hypothesis that introducing glyph sorting would significantly enhance the usability of glyph-based visualiza- 
tion. In this paper, we present a glyph-based conceptual framework as part of a visualization process for interactive 
sorting of multivariate data. We examine several technical aspects of glyph sorting and provide design principles 
for developing effective, visually sortable glyphs. Glyphs that are visually sortable provide two key benefits: 1) 
performing comparative analysis of multiple attributes between glyphs and 2) supporting multi-dimensional visual
search. We describe a system that incorporates focus and context glyphs to control sorting in a visually intuitive 
manner and to view sorted results in an Interactive, Multi-dimensional Glyph (IMG) plot that enables users to
perform high-dimensional sorting and to analyse and examine data trends in detail. To demonstrate the usability of glyph
sorting, we present a case study in rugby event analysis for comparing and analysing trends within matches. This 
work is undertaken in conjunction with a national rugby team. Using glyph sorting, analysts have reported
the discovery of new insight beyond traditional match analysis. 



1. Introduction 

Sorting large, multi-dimensional data is a growing concern
in modern data acquisition and processing, where the ordering
of data is an integral part of many applications and
disciplines, ranging from the analysis of scientific information
(e.g., using graphs and charts) to enhancing the efficiency
of algorithms. Such data are traditionally sorted
analytically in a data-driven manner (e.g., via spreadsheets), 
where users perform sorting on individual attributes of a 
multi-dimensional data set. This is a non-trivial task due
to the vast number of possible sort permutations, which greatly
impacts the expressiveness of high-dimensional visualizations
[YPWR03]. Ordering data with a high
level of sorting raises two important challenges: 1) how
the data is organised, and 2) the ordering of sort keys, neither of which
can be easily observed by viewing large tables of data.

Glyphs (sometimes known as icons) are graphical entities 
that convey one or more data values using visual features 
such as size, shape and colour. Such encoding significantly improves
perception of data characteristics and is well suited for depicting
high-dimensional, multivariate data [War02]. Chernoff
Faces [Che73] and Star Glyphs [SFGF72] are
examples of multivariate glyphs with which identifying glyphs
with similar features is effective, but determining the ordering
of glyphs is cognitively challenging. Thus, such glyphs
are not visually sortable in an obvious way. This becomes
a greater challenge when glyphs are unorganised. Figure 1
demonstrates how ordering such glyphs in a given spatial 
configuration is more informative in revealing multivariate 
trends. Glyph sorting is one approach for performing interactive
sorting of multivariate data as part of a visualization process.
As a data exploration mechanism, interactive sorting
in visualization serves the following additional objectives:
1) making observations about data patterns (e.g., clusters and
distributions) in relation to a sorted variable and stimulating
hypotheses about other variables; and 2) performing analytical
tasks and visual evaluation of hypotheses, such as which variables
may affect the ordering of a specific variable.

Figure 1: Two example multi-dimensional glyphs, namely (a) Star glyphs and (b) Bar chart glyphs. The glyphs on the
left are unordered, whereas the glyphs on the right are ordered by two sorting parameters.

In this paper, we present a novel glyph-based sorting
framework to drive and facilitate interactive sorting of data 
in a visual and intuitive manner. We describe a set of de- 
sign principles (Section 4) for mapping attributes to visu- 
ally sortable glyphs. This significantly enhances the usabil- 
ity of glyph-based visualizations for both comparative anal- 
ysis of multi-variate data and for supporting visual search. 
In Section 5, we present an interactive system for the ex- 
ploration of glyph-based visualization. Novel features of the 
system include a focus and context glyph-based user interface
(Section 5.1) to control high-dimensional sorting and
to view sorted results in an Interactive, Multi-dimensional
Glyph (IMG) plot (Section 5.2). We extend traditional axis
mapping using hierarchical axis binning (Section 5.3). This 
enables visual depiction of multiple sort key parameters in 
space, which is effective for reducing visual clutter in the 
IMG plot view. To demonstrate the effectiveness of glyph- 
sorting, we present a real-world case study of rugby event 
analysis. The work is carried out in close collaboration with 
an international rugby team, in which we developed a glyph- 
sorting software tool for use by the coaching analysts. As 
a result of glyph sorting, the analysts uncovered new insight
and knowledge for match analysis. The main contributions 
of this paper are: 

• The introduction and development of high-dimensional, 
focus and context glyphs that are visually sortable to sup- 
port sorting of multi-variate data. 

• A novel glyph-based, interactive system for controlling 
high-dimensional sorting and viewing sorted results. 

• A hierarchical axis binning method for encoding multiple 
dimensions onto a single axis. This effectively reduces vi- 
sual clutter by relaxing the positioning of glyphs. 

• An evaluation of the effectiveness of glyph sorting in a 
real-world case study of sports event analysis.

2. Related Work 

Sorting is the computational process of rearranging a sequence
of items into ascending or descending order [Knu98].
Many sorting algorithms have been proposed, including bubble
sort by Demuth [Dem56], merge sort by von Neumann
[Knu98], and quick sort by Hoare [Hoa62]. Since best-
and worst-case runtime performance can vary drastically
with such algorithms, research continues to propose
new sorting techniques [BFCM06] and adaptive approaches
that utilise ordered data [ECW92]. Our work is not focused
on a faster sorting algorithm per se, but on combining the benefits
of sorting with glyph-based visualization.

Glyph-based visualization is an established technique 
for depicting multi-dimensional data sets. The survey by 
Ward [War02, War08a] provides a technical framework for 
glyph-based visualization, covering aspects of visual map- 
ping and layout methods, as well as addressing important is- 
sues such as bias in mapping and interpretation. Ropinski et 
al. [RP08] present an in-depth survey on the use of glyph- 
based visualization for spatial multivariate medical data. 
Glyphs are widely used in other application areas, such as 
DT-MRI visualization [LAK*98, WMM*02], unsteady flow 
visualization [HLNW11] and activity recognition [BBS*08].
Lie et al. [LKH09] describe a general pipeline for visualiz- 
ing scientific data in 3D using glyphs and introduce design 
guidelines such as the orthogonality of individual attribute 
mappings. Pearlman et al. [PRdJ07] use a glyph-based mul- 
tivariate visualization to understand depth and diversity of 
large data sets. Chlan and Rheingans [CR05] use 2D and 
3D glyph-based multivariate visualization to show distribution
within the data set. Janicke et al. [JBMC10] introduce
SoundRiver, which depicts audio/video events from movies using
glyphs on a timeline. Prior to this
study, Legg et al. [LCP*12] conducted a design study to 
show the effective use of glyph-based visualization within 
sports performance analysis. A fundamental difference here 
is that we use glyphs that are visually sortable. 

Interactive visualization studies how human interaction
supports exploring and understanding datasets through
visualization, which Zudilova et al. [ZSAL08] cover in a
state-of-the-art report. De Leeuw and Van Wijk [dLvW93]
present an early work that incorporates glyphs into interactive
visualization for analysing multiple flow characteristics
in selected regions using a probe glyph. Shaw et
al. [SEK*98] describe an interactive glyph-based framework 
for visualizing multi-dimensional data, where attributes are 
mapped in order of data importance to visual cues such as location,
size, colour and shape. To our knowledge, our work is the
first to introduce focus and context glyphs
for visual sorting of high-dimensional data.



3. Sorting: Entities and Sort Keys 

Sorting is one of the most common analytical tasks and is used for
re-organising entities consisting of single or multiple fields.
The objectives of sorting can be classified into the following: 

• Ordering - arranging entities of the same type, or class 
into some ordered sequence. 

• Categorizing - grouping or labelling entities with similar 
properties through sorting. 

A sort operation can be performed based on one or more
attributes. We describe such attributes as sort keys. In more
general form, let us consider the set of objects or entities
$E = \{e_1, e_2, \dots, e_s\}$, each containing a set of attribute keys
$K = \{k_1, k_2, \dots, k_n\}$. This defines an $n$-dimensional attribute
space which governs the sorting process. Thus, $e_i$ is an $n$-tuple
or contains an $n$-tuple (as $e_i$ may have additional information
such as a video clip). For example, a group of entities $E$ may
be a pack of cards (52 entities) which is sortable
by keys $K$, such as card type (e.g., spades, clubs, diamonds
and hearts), colour (e.g., red or black) or value (1-13).

In order theory, we can specify two types of ordering relations:
a weak (non-strict) order denoted by $\preceq$, or a strict
ordering $\prec$. These two properties characterize the mathematical
concept of linear ordering [Knu98]. Given a subset
of keys $\mathbf{K} \subseteq K$, the goal of sorting is to arrange the entities
$e_i$ into an ordered set (a list) such that
$e_1^{\mathbf{K}} \preceq e_2^{\mathbf{K}} \preceq \dots \preceq e_s^{\mathbf{K}}$.
At this level of abstraction, sort keys as attributes cannot
be directly compared (i.e., by the arithmetic relations $=$, $<$ and $>$), as
they are essentially concepts. Hence, we introduce the function
$f_{\mathbf{K}} : E \mapsto \mathbb{R}$, which maps the object space with context
keys $\mathbf{K}$ to a real value such that for any entity pair $e_i, e_j$, the
ordering relation $e_i^{\mathbf{K}} \prec e_j^{\mathbf{K}}$ implies:

$$f_{\mathbf{K}}(e_i) < f_{\mathbf{K}}(e_j) \qquad \forall i, j = 1, 2, \dots, s, \; i \neq j$$

With additional semantics, one can define such a function
$f_{\mathbf{K}}$ to sort data (e.g., events) into more practical, or memorable
orderings beyond common sorts (e.g., alphabetical),
since $f_{\mathbf{K}}$ could be an importance function. However, this
may cause data to lose its perceived ordering at the analytical
level. We introduce glyph sorting as one solution for performing
interactive sorting in visualization, where one goal
is to use glyphs to sort the data.
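
The role of the key function can be illustrated with the pack-of-cards example above. The following is a minimal sketch only: the `Card` type, the suit ranking in `SUIT_RANK` and the function names are hypothetical and chosen for illustration, and the ranking stands in for any importance function a user might define.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    suit: str   # "spades", "clubs", "diamonds" or "hearts"
    value: int  # 1-13


# Hypothetical importance ranking of suits; any total order would do.
SUIT_RANK = {"clubs": 0, "diamonds": 1, "hearts": 2, "spades": 3}


def f_suit_then_value(card: Card) -> float:
    """Map a card to a real number: suit dominates, value breaks ties."""
    return float(SUIT_RANK[card.suit] * 13 + (card.value - 1))


def sort_entities(entities, f):
    """Arrange entities in ascending order of the real value f(e)."""
    return sorted(entities, key=f)


cards = [Card("hearts", 3), Card("clubs", 12), Card("spades", 1), Card("clubs", 2)]
ordered = sort_entities(cards, f_suit_then_value)
```

Because the key function returns a real value, the abstract attributes (suit, value) never need to be compared directly; changing the ranking table changes the ordering without touching the sort itself.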



4. Design Principles of Sortable Glyphs 

Building on previous works [Ber83, War08a, MPRSDC12], 
we propose the following design principles for the creation 
of sortable glyphs to be used in interactive sorting as part of 
a visualization process. 

Typedness: Each variable in a multivariate dataset may be 
of a different data type. Typically, these are classified using 
the theory of scales [Ste46] by: nominal, ordinal, interval, 
and ratio. In addition, direction should be considered as an 
important data type in visualization [War08b]. Although hy- 
pothetically, we can map all data types to one or a few visual 
channels, such as length and size, it is more appropriate to 
use visual mappings that intuitively convey the underlying 
data type. For example, in Figure 2(a) it is clearer to deter- 
mine the underlying data types for each variable in the glyph 
from the top row (that illustrates greater emphasis) than the 
bottom row (that illustrates less emphasis). 

Visual Orderability: Some channels (e.g., size, greyscale
intensity) naturally correspond to quantitative measures that
enable a viewer to order different glyphs perceptually, while
for others (e.g., an arbitrary set of shapes, or textures) it is
much more difficult for viewers to establish a consistent rule
of ordering [War08b, War04]. Figure 2(b) shows two example
glyphs depicting 8 variables of the same data type. It
is easier to visually order the 8 variables in the top glyph, 
than the bottom glyph. Additional semantics can be attached 
to a visual channel such that it becomes visually orderable. 
For instance, scientists often make use of the colour spec- 
trum to determine the order of colours, which may not be 
natural to a child who is unfamiliar with this concept. In 
some cases, one may have to use a visual channel with very 
poor orderability such as metaphoric pictograms. The prob- 
lem can be alleviated by accompanying such visual channels 
with an additional channel that is more visually orderable. 
For example, different pictograms can be associated with a
background of different greyscales, or a regular polygonal
boundary with a different number of edges. Alternatively, one
may carefully design the pictogram set to make some components
of pictograms orderable. For example, Maguire et
al. design a set of 7 pictograms with an incremental number of
components to encode levels of material granularity in biology
[MPRSDC12].

Channel Capacity: We adopt this term from information 
theory to indicate the number of values that may be encoded 
by a visual channel. It is necessary to note that such a capacity
value is not an absolute quantity, as the number depends
on the size of a glyph as well as many other perceptual fac- 
tors such as just noticeable difference [BF93], interference 
from nearby visual objects, or from a co-channel in an in- 
tegrated channel [She64, HI72]. From the glyph designs in 
Figure 2(c), we can clearly observe that the top glyph has a 
higher channel capacity since each bar can encode more values
visually than the radial lines below. It will always be desirable
to use a visual channel with a higher capacity, though
this is often in conflict with other requirements.

Figure 2: Variations of glyph design in accordance with the design principles of sortable glyphs (a)-(h). For each principle, the
top row depicts a glyph with greater emphasis and the bottom row depicts a glyph with less emphasis.

Separability: There have been many psychology studies on 
the relative merits of separable and integrated visual chan- 
nels (e.g., [She64, HI72]). Maguire et al. discuss this re- 
quirement in the context of glyph design in [MPRSDC12]. 
We find that this requirement is particularly important to 
glyph sorting. For example, in Figure 2(d), the glyph below 
encodes 8 variables using 4 integrated channels. Each of the 
4 circles encodes two variables using size and greyscale intensity.
Not only is the perception of an individual channel affected
by another in an integrated encoding, but ordering them
may also demand more cognitive load for a viewer
to detach one channel from another (e.g., intensity and size).

Searchability: For glyphs encoding high-dimensional mul- 
tivariate data, it is necessary to help viewers to search 
rapidly for a specific variable among many other variables
[War08b]. In Figure 2(e), for example, it will be much
easier to search for a green variable than for the 5th variable.
Searchability is affected by many factors [HE12]. One dom- 
inant factor is the visual dissimilarity of individual channels. 
Hence searchability is closely related to typedness and sep- 
arability as mentioned above. It is also related to the spa- 
tial organisation of different visual channels such as group- 
ing and ordering, as well as design appearance of each vi- 
sual channel. In many cases, one has to introduce an ad- 
ditional visual channel, such as colour in the top glyph in 
Figure 2(e) to help differentiate different variables. Another 
factor is learnability, which is to be discussed below. 

Learnability: While legends are usually essential to glyph- 
based visualization systems, they cannot replace the need 
for careful glyph designs to help viewers learn and mem- 
orise the association between variables and visual chan- 
nels without constantly consulting legends. It is desirable 
for the appearance of a visual channel to be metaphorically 
associated with the semantic meaning of the correspond- 
ing variable [War08b, SJAS05]. One of the most effective 



metaphoric designs is to use pictograms. This design prin- 
ciple was demonstrated by Legg et al. [LCP*12] through 
the deployment of glyph-based visualization in sports. Fig- 
ure 2(f) shows two different levels of learnability, when for 
example one needs to encode the number of greeting cards 
in different categories. The glyph on the top row is seman- 
tically rich and is much easier to learn than that on the bot- 
tom row. However, not all glyph-based visualization can af- 
ford pictograms. These constraints can often be alleviated by 
making abstract metaphoric association, such as green for 
nature, renewable, safe, and so on. 

Attention Balance: In multivariate visualization, one common
task is to observe the "behaviour" of different
variables in relation to the variable(s) in a sorted order.
While it is helpful to make each individual variable search- 
able [TCW*95, War08b], it is also necessary to avoid unbal- 
anced attentiveness among different channels. For example, 
the bottom glyph in Figure 2(g) features bright red indicators 
for some variables. When browsing different glyphs in visu- 
alization, these red triangles are dominant which may cause 
undesirable pop-out effects. 



Focus + Context: In multivariate visualization, it is usually 
difficult, often undesirable, to pre-determine what is the fo- 
cus variable and what is the context variable. Naturally, in 
glyph sorting, a variable that is associated with a sort key is 
considered as one of the foci. In some cases, the viewer may 
wish to consider another variable as a focus. Hence it is de- 
sirable for a glyph sorting system to support focus+context 
visualization by highlighting individual channels that are in 
focus. Straka et al. [SCC*04] demonstrate this design principle
for glyphs in CT-angiography. Such highlighting can be expensive,
because in the worst case, each visual channel is accompanied
by another channel as a highlighter.

Labelling and Legends: Axis-labelling is an essential requirement
for any sorting configuration for indicating sort
keys [WGK10]. It enables the viewer to understand the context
(e.g., frequency vs. amplitude in sound analysis) without
referring to the visualization itself. Bertin [Ber83] refers to
this as external identification. Legends convey the relationships
between variables and visual channels, and their representation
for a given discrete or continuous value. This is often
known as internal identification [Ber83].

These design principles are general guidelines that we 
consider when designing glyphs to be sorted interactively 
in visualization. However, they should not be treated as
absolute laws. In some cases, following these principles may lead to
conflicting requirements, or the principles may compete
for the limited capacity of visual channels in smaller designs.

5. Interactive, Glyph-based Visual System 

In this section, we propose a glyph-based visual analytic sys- 
tem for performing glyph sorting which is outlined in Fig- 
ure 3. The system integrates two fundamental components: 
1) a glyph control panel for selecting and driving the sort- 
ing process in a visual manner, and 2) an Interactive, Multi- 
dimensional Glyph plot for viewing sorted results. 

5.1. Focus and Context Glyph-based Interface 

Our glyph-based sorting system utilises a focus and context
glyph-based user interface for selecting sort keys [see
supplementary video]. The interface provides two main benefits.
First, it allows users to interactively control the sorting process
by populating sort keys within the linked IMG plot in
a visually intuitive manner. Second, the focus and context
glyph gives a visual reference which allows users to rapidly
identify and understand the attributes that drive the sorting.

Sort keys are selected in the system by interactively click- 
ing on a visual component of the glyph. The selected visual 
attribute is then rendered into focus using opacity such that 
the data attribute is visually distinct from other attributes. 
This is an effective method for drawing the user's attention to specific
parts of high-dimensional glyphs. Similarly,
users can remove a sort key by clicking on a glyph com- 
ponent in focus and dragging it off the glyph to bring the 
attribute back into context. By linking the interface with 
the IMG plot, users are able to populate different sort keys 
in a visually intuitive manner. Furthermore, we incorporate 
tooltips into the interface to aid users with information on
which attribute is visually encoded in each glyph component.

5.2. Interactive, Multi-dimensional Glyph Plot 

Since ordering in a sorting plane is one of the most effec- 
tive and widely recognised representations for data analysis 
(e.g., scatter plot), we position the glyphs along the two primary
sorting axes. This forms the basis of our Interactive,
Multi-dimensional Glyph (IMG) plot. Following the design
principles in Section 4, populated sort keys are depicted as 
focus and context glyphs along each sorting axis respectively 
(see Figure 6 for example) coupled with a visual legend to 
illustrate how the data is ordered. The sort key priority can
be changed interactively by the user via double clicking on
the sort key glyph, to either promote (using the left mouse
button) or demote (using the right mouse button) the ordering.
We integrate a series of interactive tools to aid user exploration:
sliders for adjusting axis length, brushing tools for
selecting glyphs, pan-and-zoom navigation for details on demand
and viewing of additional information (e.g., a video or
image) that may be associated with a glyph.

Figure 3: A graphical pipeline illustrating the glyph sorting
framework. It consists of four key steps: 1) visual mapping
of data to glyphs, for which we propose general design guidelines for
creating visually sortable glyphs to support interactive sorting
and multivariate analysis (alternatively, a default glyph,
e.g., a Star glyph, is used); 2) integrating a focus and context
glyph control panel for selecting multiple sort keys; 3) constructing
the glyph sorting tool, which enables users to perform
high-dimensional sorting and interactively adjust various
display options; and 4) visual representation of sorted
results on an Interactive, Multi-dimensional Glyph plot.

Visualizing glyphs on a 2D plane imposes additional challenges.
One perceptual problem is the order in which glyphs
are rendered on the IMG plot. By default, glyphs are rendered
sequentially as they occur in the dataset. Depending
on the sorting parameters, this causes different levels of
overlap. To alleviate this, we incorporate the ability to sort
the rendering order of selected glyphs. This enables the user
to emphasise glyphs of greater interest for data exploration.
In addition, we provide two display preferences as user
options: Connectivity, which renders lines that connect glyphs
in order of a sorting attribute, and Mean Bars, which displays
the statistical average value of a sorting axis (if applicable)
as a coloured band in each hierarchical axis bin.
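
The promote and demote interaction on sort keys described in this section can be read as reordering a priority list that drives a multi-key sort. The sketch below illustrates that reading only; the function names and the sample event records are hypothetical, and the attribute names `gain` and `phases` are borrowed from the rugby case study purely as labels.

```python
def promote(keys, name):
    """Move a sort key one step towards the front (higher priority)."""
    i = keys.index(name)
    if i > 0:
        keys[i - 1], keys[i] = keys[i], keys[i - 1]


def demote(keys, name):
    """Move a sort key one step towards the back (lower priority)."""
    i = keys.index(name)
    if i < len(keys) - 1:
        keys[i + 1], keys[i] = keys[i], keys[i + 1]


def multi_key_sort(records, keys):
    """Sort records by the listed keys, earliest key taking precedence."""
    return sorted(records, key=lambda r: tuple(r[k] for k in keys))


# Hypothetical event records for illustration.
events = [
    {"id": "a", "gain": 2, "phases": 1},
    {"id": "b", "gain": 1, "phases": 3},
    {"id": "c", "gain": 2, "phases": 0},
]

keys = ["gain", "phases"]
by_gain_first = [e["id"] for e in multi_key_sort(events, keys)]

promote(keys, "phases")  # "phases" now outranks "gain"
by_phases_first = [e["id"] for e in multi_key_sort(events, keys)]
```

Keeping the priority as an explicit list means a single interaction only swaps two neighbouring entries, after which the plot can be re-sorted from the same records.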

5.3. Hierarchical Axis Binning

In data-driven placement, sorting data by discrete variables 
is a typical operation one can perform. However, this often 
leads to an increase in the level of overlap due to discrete positioning
in the constrained sorting space. Ward [War02] surveys
distortion techniques (e.g., random jitter
[Cle93]), a post-processing step which can be used to
reduce visual clutter by incorporating noise into the glyphs'
positions. A major concern with this approach is that the
distortion introduced can significantly change the interpretation
and integrity of the visualization.

Hierarchical axis binning is a mapping function to alle- 
viate such a problem by representing multiple discrete vari- 
ables as regions as opposed to points. Encoding multiple di- 
mensions onto a single axis enables additional sorting func- 
tions (e.g., a continuous variable) to be mapped for relax- 
ing the positioning of glyphs along a bounded sub-region. 
Figure 4 illustrates our generalised axis binning algorithm 
at different levels of sorting which we demonstrate along 
one axis. However, our technique can be applied over mul- 
tiple sorting axes. Let L be the interval [L m j n ,L max ] and 
K = (k\ , hi, ■ ■ ■ , kn) be a set of sort keys we want to order 
the data by. We define our axis mapping function for a sin- 
gle key k as the following: 



h(e,L,k) ■■ 



max/* 



(1) 



The linear function first normalises the attribute key and 
maps this to the region L, such that if k is discrete and non- 
numerical (e.g., name), then max/ is equivalent to the car- 
dinality | \k\ | of the sort key. For higher order sorting, we ex- 
pand the region given by each discrete value hierarchically 
to map additional sort functions. Let us first denote the type 
of a key as k T , where T = {Discrete, Continuous}. Now 
suppose A 6 K is a ordered sequence of discrete and contin- 
uous sort keys. We apply the restriction A ■ ' = {a , ',..., a n " } 
such that 7] =^ 7} + i for i = 1,.. . ,n — 1, where the condition 
=^ is used to obtain a list where no continuous key directly 
precedes a discrete key for each sort key pair. With such an 
ordered list, we can define a hierarchical sorting function for 
mapping and relaxing points along discrete sub-regions re- 
cursively by Eq. 1 . This is generalised to the following form: 



H{e,L,A) ■. 



Y, h ( e > L i- a i) 



(2) 



where L,- is the interval at each level. At the sort level i = 1, 
our interval is already initialised (i.e., the axis length, where 
L\ — L). Thus, it is only necessary to determine each sub- 
region division at successive levels of sorting. The sub re- 
gions are defined as L, + i £ [— 5/+i , +<5;+l] sucn that: 



St 



+1 



2 max/* 



■ H, Me [0,1) 



(3) 

in which the coefficient $\mu$ is used to adjust the maximum
length of each sub-region. For $\mu = 1$, adjacent sub-regions
touch (connected), while for $\mu > 1$, our intervals begin to
overlap. For visual representation, we set $\mu < 0.8$ to allow
significant gaps between each axis bin at all levels of sorting.
Since our function is bijective, it follows that each data point
is unique. Hence, the complexity of ordering glyphs with
multiple sort keys both analytically and visually is reduced
to sorting by a dominance relation (e.g., x and y coordinate).

Given an ordered list of discrete and continuous keys, we
can hierarchically build multiple axis bins to facilitate sorting
of multiple functions. The user is able to interactively
control the amount of spatial relaxation by adjusting two
properties: the axis length and the width of axis binning.
Each hierarchical axis bin size is altered by varying the sorting
parameter $\mu$ which corresponds to each level of sorting.

Figure 4: Diagram illustrating hierarchical axis binning
along one sorting axis. Intervals (or sub-regions) at each
level of sorting can be sub-divided by different attributes
where additional sort functions can be mapped. We note that
the axis mapping can be applied along multiple axes.
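
One possible reading of hierarchical axis binning (Eqs. 1-3) is sketched below for a single axis. This is an interpretation under stated assumptions, not the system's implementation: it assumes Eq. 1 scales the normalised key value linearly onto the interval, that each deeper level uses a sub-region of half-width equal to the parent interval length divided by twice the parent key's maximum, scaled by the coefficient mu, and that per-level positions are summed. The function names and the sample `team` and `time` attributes are hypothetical.

```python
def h(value, interval, max_value):
    """Eq. 1 (as assumed here): normalise a key value and map it onto an interval."""
    lo, hi = interval
    return lo + (value / max_value) * (hi - lo)


def hierarchical_position(entity, axis, keys, mu=0.8):
    """Sum per-level mappings (Eq. 2), shrinking the interval per level (Eq. 3).

    keys is an ordered list of (f, max_f) pairs, where f maps an entity to a
    number and max_f is the maximum of f (the cardinality for a discrete key).
    Discrete keys are expected to come before continuous ones.
    """
    interval = axis
    position = 0.0
    for f, max_f in keys:
        position += h(f(entity), interval, max_f)
        # Half-width of the sub-region available to the next sort level.
        delta = (interval[1] - interval[0]) / (2.0 * max_f) * mu
        interval = (-delta, delta)
    return position


# Hypothetical example: a discrete key with two values, then a continuous key.
axis = (0.0, 100.0)
keys = [
    (lambda e: e["team"], 2),     # discrete, values 1..2
    (lambda e: e["time"], 80.0),  # continuous, 0..80
]
entities = [
    {"team": 1, "time": 0.0},
    {"team": 1, "time": 40.0},
    {"team": 2, "time": 80.0},
]
positions = [hierarchical_position(e, axis, keys, mu=0.5) for e in entities]
```

In this sketch the first key selects a bin centre and the second key relaxes the glyph within a band around that centre, so glyphs sharing a discrete value no longer collapse onto one point. Smaller values of `mu` narrow the band and widen the gap between neighbouring bins.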

6. Case Study: Sports Event Analysis 

We demonstrate glyph sorting on a real-world application in 
sports event analysis. We have worked in close collaboration 
with the Welsh Rugby Union (WRU) to develop software
that allows for in-depth analysis of matches. First, we de- 
tail the process of mapping attributes to a sortable glyph. We 
then present a visual comparison of two matches which was 
conducted by analysts at the WRU. We discuss the knowledge
and insight that have been derived as a result of glyph
sorting and conclude the study with domain expert feedback. 

6.1. Visual Mapping of Sort Keys 

In sports performance analysis, coaches and analysts heavily
rely on notational data [HF97]. This involves "tagging"
video footage with key events and semantic notations from
which key performance indicators can be derived. Spatial
tracking data is another source of information, which analysts
study as a separate field. However, without the semantic context,
such data is meaningless and is often disregarded due to
the deluge of data. We design glyphs that combine both notation
and spatial data to be used for interactive sorting and
visualization [see supplementary Figure A]. Table 1 gives
an overview of the set of attributes in rugby event analysis,
which are ranked in order of data importance based on end-user
feedback. Following the design principles presented in
Section 4, we describe the methodology of mapping rugby
event data to visually sortable glyphs (see Figure 5).

Sort Key                  Typedness   Visual Channel

Gain                      Ordinal     Colour
Event                     Nominal     Pictogram
Territory Start Position  Interval    Size
Tortuosity                Ratio       Shape
Number of Phases          Ratio       Enumerate
Direction                 Direction   Orientation
Net Lateral Movement      Ratio       Length
Time                      Ratio       Location
Phase Duration            Ratio       Length
Team Identifier           Nominal     Colour

Table 1: The set of sort keys in rugby event
analysis. Each attribute is classified by typedness and by
the visual channel to which it is mapped in the glyph. Data attributes are
ranked in order of importance from top to bottom.

The goal of rugby is to carry a ball to the opposition try 
line. Gain is the term used for the distance gained towards 
the opposition try line as a result of free play. Although gain 
is naturally of interval type, conventions in rugby adopt an 
ordinal measurement (e.g., negative gain, minor variation, 
major gain). Thus, a discrete representation is needed. Since 
end-users make use of an existing ordered colour scheme, 
it is natural to map gain to this visual channel to support 
visual orderability, learnability and searchability, 
given the high visual priority of colour. The context in which 
gain is achieved is particularly important. These start events 
(e.g., from lineout, scrum, etc.) are nominal categories that 
classify periods of play into more semantically meaningful 
groups. Here, the events are sorted by importance. We 
discussed previously in Section 4 the use of metaphoric 
pictograms for mapping such data [LCP*12]. Pictograms can 
often be arbitrary, in that their shape, size and colour will vary, 
thus having a low visual orderability. Using different 
intensities to draw each pictogram is one solution to establish a 
visual ordering; however, this may be misleading since event 
is discrete and not continuous. Instead, we design and 
order the pictograms according to their relative greyscale pixel 
count which is more appropriate for our study. Typically 
associated with a start event is whether that event resulted in 
points scored (i.e., the end event). These glyphs should be 
differentiable to the viewer. Therefore, we use a coloured halo 
effect to enhance the attention-balance of such glyphs. 

Figure 5: Components and visual channels of the glyph. Labelled components: number of phases; time and event duration; tortuosity mapped to boundary curvature; territory start position mapped to radius; contour colour as team identifier; net lateral movement as arrow width. 
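The pictogram ordering described above can be illustrated with a minimal sketch. It assumes that "relative greyscale pixel count" means the overall darkness of a pictogram, measured as the mean ink fraction of its bitmap; the bitmaps, event names and function names below are hypothetical and are not the pictograms or code used in the system.

```python
# Hypothetical pictograms as tiny greyscale bitmaps (0 = black ink, 255 = white).
# Names and pixel values are illustrative only.
PICTOGRAMS = {
    "scrum":    [[0, 0, 0], [0, 0, 0], [0, 255, 0]],
    "lineout":  [[255, 0, 255], [255, 0, 255], [255, 0, 255]],
    "turnover": [[0, 255, 0], [255, 0, 255], [0, 255, 0]],
}


def ink_fraction(bitmap):
    """Return the mean darkness of a bitmap in [0, 1] (1 = fully black)."""
    pixels = [p for row in bitmap for p in row]
    return sum(255 - p for p in pixels) / (255 * len(pixels))


def order_by_ink(pictograms):
    """Order pictogram names from lightest to darkest overall appearance."""
    return sorted(pictograms, key=lambda name: ink_fraction(pictograms[name]))


if __name__ == "__main__":
    for name in order_by_ink(PICTOGRAMS):
        print(f"{name}: {ink_fraction(PICTOGRAMS[name]):.2f}")
```

Ordering by overall darkness keeps each pictogram recognisable as a discrete category while still giving the viewer a perceptual cue for where it sits in the sorted sequence.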

In rugby, the pitch is divided into key areas known as ter- 
ritory, which describes the spatial property of an event. The 
territory start position gives an indication of how far an event 
occurs from the opposition try line. Given that visual separa- 
bility of variables is a key requirement in glyph sorting, we 
avoid overloading a single channel (e.g., colour) by encoding 
this attribute using size. Using the glyph template described 
in [MPRSDC12], we map this to the radius of a transpar- 
ent, external grey silhouette. Size is a suitable mapping for 
ordering quantitative variables (i.e., interval and ratio) and 
also yields a high searchability due to visual pop-out, mak- 
ing this ideal for attributes of greater importance. The addi- 
tional channel capacity introduced by the silhouette enables 
us to encode a varying line curvature along the contour for 
displaying the tortuosity of the ball path. Semantically, the 
line curvature resembles the tortuosity or shape of the ball 
path, which makes this easier for users to infer or remember. 
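As a rough illustration of the two silhouette mappings, the sketch below computes a tortuosity value and a silhouette radius. It assumes the common definition of tortuosity as path length divided by straight-line distance, a linear radius mapping, and a 100 m pitch; none of these is stated in this section, and the function names and radius range are hypothetical. The direction of the radius mapping follows the later observation that events in the defensive third show a shorter silhouette.

```python
import math


def tortuosity(waypoints):
    """Ratio of travelled path length to straight-line distance (>= 1)."""
    path = sum(math.dist(a, b) for a, b in zip(waypoints, waypoints[1:]))
    chord = math.dist(waypoints[0], waypoints[-1])
    return path / chord if chord > 0 else float("inf")


def silhouette_radius(dist_to_try_line, pitch_length=100.0, r_min=10.0, r_max=30.0):
    """Linearly map start position to radius (assumed mapping).

    An event starting closer to the opposition try line gets a larger radius.
    """
    t = 1.0 - min(max(dist_to_try_line / pitch_length, 0.0), 1.0)
    return r_min + t * (r_max - r_min)


if __name__ == "__main__":
    ball_path = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]  # hypothetical waypoints
    print(tortuosity(ball_path), silhouette_radius(50.0))
```

The tortuosity value could then drive the amplitude or frequency of the contour curvature, although the exact curve construction is not specified here.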

A single path (or ball-in-phase) consists of a series of 
waypoints and path segments. In rugby, these waypoints and 
segments correspond to the number of phases. A simple and 
effective mapping for such discrete data is to use an enumerative 
representation due to its natural ordering. We depict the 
number inside an arrowhead which is oriented according 
to the resulting ball direction. Since orientation has weak 
learnability, we incorporate a metaphoric cue, i.e., a compass, 
by positioning the arrowhead along a circle to make this 
more memorable to the end-user. We map arrow width to 
net lateral movement, which indicates the relative lateral 
distance travelled. Since net lateral movement and direction are 
correlated, it is sensible to couple both variables together. 

Another data coupling is time and phase duration, which 
describes the temporal period in which the event occurs. Because 
both attributes are of ratio type and continuous, it is 
possible to combine such data using an integrated encoding 
to maximise channel capacity. We represent time using a 
clock visual metaphor, where time and duration are mapped to 
the location (or orientation) and length of the time handle, respectively. The 
semantics of a clock are used to enhance the visual orderability 
of time. In order to facilitate aspects of our sort 
key visual mappings, we adopt a circular glyph design 
(Figure 5). The final attribute we map is team identifier 
(i.e., home or opposition), which we depict by colour-coding 
the inner contour. We follow the general convention used in 
sport for distinguishing two teams by mapping red and blue 
to the teams respectively. This makes the glyph concept more 
familiar to sporting domain experts, which supports 
learnability and visual search. 

Figure 6: Visual comparison of two rugby matches using the IMG plot. Top: Match 1 (M1). Bottom: Match 2 (M2). For each 
match, the glyphs are ordered using three sort keys: Gain versus Start Event and Tortuosity. Mean bars are also displayed as a 
user-option to provide additional statistical information. 
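The clock metaphor used for time and duration in Section 6.1 can be sketched as follows. This is only an illustration under stated assumptions: match time is spread over one full turn of an 80-minute clock face, duration is scaled linearly to handle length, and the parameter names, the 120-second duration cap and the length range are hypothetical.

```python
import math

MATCH_MINUTES = 80.0  # regulation length of a rugby union match


def clock_handle(start_min, duration_s, max_duration_s=120.0, r_min=0.2, r_max=1.0):
    """Return (angle_deg, length) for the time handle (assumed mapping).

    The angle is measured clockwise from 12 o'clock; the length is relative
    to the glyph radius and grows linearly with event duration.
    """
    start = min(max(start_min, 0.0), MATCH_MINUTES)
    angle = 360.0 * start / MATCH_MINUTES
    frac = min(max(duration_s / max_duration_s, 0.0), 1.0)
    return angle, r_min + frac * (r_max - r_min)


def handle_tip(angle_deg, length):
    """Convert a handle to (x, y) glyph coordinates with y pointing up."""
    a = math.radians(angle_deg)
    return length * math.sin(a), length * math.cos(a)


if __name__ == "__main__":
    angle, length = clock_handle(start_min=20.0, duration_s=60.0)
    print(angle, length, handle_tip(angle, length))
```

Because both values are read from a single handle, one visual element carries two ratio attributes, which is the integrated encoding the text refers to.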

6.2. Visual Comparison of Two Matches 

Analysts are normally tasked with watching multiple match 
videos to identify the occurrences of key performances. This 
is laborious and time-consuming, and even current tech- 
niques such as notational analysis do not allow the analysts 
to discover new insight but merely review what has been 
previously recorded. As part of our evaluation, we use 
glyph-based visual analytics to analyse the performance 
of a single team in two different rugby matches, as shown in 
Figure 6. Match 1 (M1) involves two evenly matched teams, 
resulting in a closer point score differential. This is compared 
to Match 2 (M2), where one team proved to be more 
dominant. Both matches are taken from the World Cup 2011. 
By using visual analytics, the domain experts are interested 
in seeing how the two matches compare and in investigating 
why the outcomes of the two matches are so different. 

We presented the software to the analysts and explained 
the usability prior to letting the analysts explore the two 
datasets. One topic of interest is the relationship between 
gain and tortuosity, i.e., whether the strategy of working the 
opposition (high tortuosity) resulted in greater gain. Sorting 
the glyphs by the two attributes reveals a uniform Gaussian 
distribution of glyphs in both matches [see supplementary 
video]. A clear observation is the significantly lower average 
tortuosity in M2, indicated by the greater spread of 
glyphs and overall shift along the tortuosity sorting axis. 
This suggests that it requires less effort to make sizeable gains 
in each phase and thus attacking the opposition more directly 
can yield larger benefits. At a glance, the analyst 
identifies many different event types (i.e., pictograms) appearing 
within the cluster in the visualization. This directed 
the user to inspect how start event would affect the ordering 
of the glyphs via an alternative sorting strategy. 

Figure 6 illustrates the comparison of the two matches 
where the user sorts the glyphs based on three attributes: gain 
versus start event and tortuosity. One new feature not previously 
observable is the variation of start events that resulted 
in points scored, which is depicted by glyphs highlighted in 
purple. It is clear in M1 that most points are scored from 
lineouts. In comparison, the other match exhibits a more 
uniform distribution of point scoring events. From this, we 
can hypothesise specific strengths and weaknesses of different 
teams. The statistical information displayed by mean bars is 
useful for analysing and deriving new key performance indicators. 
For instance, phases from turnover provide the highest 
average gain across both matches, as shown by the highest 
blue bands in each hierarchical axis bin. Thus, the number of 
turnovers is one key indicator that influences team 
performance. After turnovers, scrums are the next most effective in 
M1, whereas lineouts proved more successful in M2. 
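The three-key sort and the mean bars discussed above can be sketched as a hierarchical ordering followed by a per-bin average. This is a minimal illustration, assuming that start event forms the outer bins of the horizontal axis, tortuosity orders glyphs within each bin, and gain (here in metres) is plotted vertically. The event ranking, records and function names are hypothetical and do not come from the match data.

```python
from collections import defaultdict
from statistics import mean

# Assumed ordering of start events along the hierarchical axis (illustrative).
EVENT_RANK = {"turnover": 0, "scrum": 1, "lineout": 2}

# Hypothetical glyph records: (start_event, tortuosity, gain_in_metres).
GLYPHS = [
    ("scrum", 1.8, 5.0), ("turnover", 1.2, 22.0), ("lineout", 2.5, -3.0),
    ("turnover", 2.1, 14.0), ("scrum", 1.1, 9.0), ("lineout", 1.4, 4.0),
]


def layout(glyphs):
    """Order glyphs by start-event bin first, then by tortuosity within a bin."""
    return sorted(glyphs, key=lambda g: (EVENT_RANK[g[0]], g[1]))


def mean_gain_per_bin(glyphs):
    """Mean gain per start-event bin, i.e. the value a mean bar would show."""
    bins = defaultdict(list)
    for event, _, gain in glyphs:
        bins[event].append(gain)
    return {event: mean(gains) for event, gains in bins.items()}


if __name__ == "__main__":
    for glyph in layout(GLYPHS):
        print(glyph)
    print(mean_gain_per_bin(GLYPHS))
```

Changing the sort keys only changes the key function passed to `sorted`, which mirrors how an analyst can switch sorting strategies without reprocessing the raw data.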

Under the new sort operation, the analysts discover a new 
data trend that is present in M1 and not the other (see Figure 6), 
where the glyphs appear within each axis bin along a 
line from top left to bottom right. This indicates that 
the team achieved more gain whilst attacking the opposition 
directly, and that gain decreases with higher tortuosity. 
At first, this was not what the analysts expected. By visually 
analysing the glyphs in the upper left cluster, we found the 
events to occur largely within the defensive third, as shown 
by the shorter grey silhouette on each glyph. For a greater 
level of detail, the analyst studies the sorted video clips that 
are associated with each glyph, and finds that the higher gain 
is a result of the team kicking the ball forward out of defence. 
Although kicking the ball results in greater gain, this comes 
at the cost of losing ball possession, which is crucial. 

The analysts found the trends to be insightful for explaining 
strategies against different oppositions. Tactically, the 
visual patterns observed in M2 describe a more offensive 
game plan which is carried out each time the team regained 
ball possession. In comparison, M1 shows a clear distinction 
between offence and defence, where the team selectively 
chose key moments (e.g., pitch position) to attack the opposition. 
The information correlates well with the analysts' understanding, 
since mistakes against stronger oppositions (i.e., 
M1) come with higher risk, which can impact the outcome 
of a match. One further observation visible in M2 is shown 
by the ordering of glyphs in the turnover event category, in 
which the variable gain increases with tortuosity. Such a pattern 
indicates the opposition defence tiring as the home team 
attacked the ball, creating a prospective scoring opportunity. 



6.3. Domain Expert Review 

The development of the work has been an iterative process 
in close collaboration with the Welsh Rugby Union (WRU), 
spanning over 12 months. From inception of the idea, it was 
clear that the analysts wanted to be able to interrogate their 
data in a more complex manner than previously possible in 
order to gain new insight. The introduction of spatial data 
into visual analytics has meant that this is now achievable 
and has been used to derive novel information intuitively. 
[name remove for review] of the WRU performance analysis 
team provided valuable feedback on the usage of glyph 
sorting within rugby performance analysis. 

"The strongest element of the system is the ability to in- 
teractively sort vast quantities of data according to multi- 
ple attributes for revealing trends or groups of data. Your 
eyes are instantly drawn to those patterns. In our current 
practice, getting the data and generating charts (through 
spreadsheets) is very time consuming. Once a chart is plotted, 
we often get 'What if we take this variable into account?', 
which then requires us to go back to the raw data 
and process it all again. Whereas with this, we can navigate 
the data much more effectively. The visualization is insightful 
for giving an overview of a match. Sorting the data gives 
good visual cues for pointing us in the right direction and 
being able to look in detail at the associated videos helps to 
clarify and explain what those trends are." 

The feedback received from the WRU analysis team 
proved to be very encouraging. It supports the hypothesis that the use of 
glyph sorting can significantly enhance the effectiveness of 
glyph-based visualization. By integrating glyphs into the 
sorting process and linking this with multiple video clips, 
the analyst is able to derive new underlying phenomena from 
a match. In particular, the domain experts feel that such a 
system is highly beneficial in their workflow for post-match 
analysis, where the insight obtained from sorting is useful 
for formulating strategies against different oppositions. 

7. Conclusions 

In this work we have developed a glyph-based sorting frame- 
work for interrogating and interpreting large multivariate 
data. We have demonstrated the technique by applying it 
to sports performance analysis, where a variety of contin- 
uous and discrete data forms are incorporated into a visually 
sortable glyph design. Glyph sorting is an effective means 
for multivariate analysis and can be used to enhance the us- 
ability of glyph-based visualization and enrich the users with 
alternative sorting strategies for revealing trends. Our sort- 
ing framework enables the analysts to derive new insight as 
a result of high-dimensional sorting that was previously not 
observable with existing techniques. 